Cross-Reference to Related Applications

[0001] This application claims priority under 35 U.S.C. §119(e) to co-pending U.S.
Provisional Patent Application No. 60/399,507 entitled "Extended Isomap for Pattern 
Classification," filed on July 29, 2002, the subject matter of which is incorporated by reference 
herein in its entirety. This application is also related to co-pending U.S. Patent Application No. 
10/201,429 entitled "Face Recognition Using Kernel Fisherfaces," filed on July 22, 2002. 

Technical Field 

[0002] The present invention relates generally to pattern classification and, more specifically, 
to representing images for pattern classification by extended Isomap using Linear Discriminant 
or Kernel Fisher Linear Discriminant. 

Background of the Invention 

[0003] Pattern classification (also known as pattern recognition) has received increased
attention lately, since it can be used in various applications. For example, face recognition 
technology, which involves classification of face images, can be used in applications such as 
surveillance, security, advertising, and the like. 

[0004] Pattern classification involves classifying data points in an input space where the data
points correspond to the images or patterns to be classified. The data points typically lie on a
complex manifold in the input space, and pattern classification is carried out by determining how
close the data points are to reference data points corresponding to reference images. For
example, in face recognition, a face image is typically a two-dimensional N by N array of
intensity values, and each face image can be represented as a vector of dimension $N^2$ in an
input space of dimension $N^2$. A set of face images corresponds to a set of data points (vectors)
in the $N^2$-dimensional input space, and the data points typically constitute a complex manifold
in that space. Face recognition involves
determining how close the data points corresponding to the face images are to data points 
corresponding to reference face images. 

[0005] In order to determine how close the data points are to each other and to the reference 
data points in the input space for pattern classification, the nature of the manifold should be 
taken into consideration. That is, the geodesic distance (distance metrics along the surface of the 
manifold) between data points in the input space should be used to determine how close the data 
points are, because the geodesic distance reflects the intrinsic geometry of the underlying 
manifold. 

[0006] FIG. 1A is a diagram illustrating an example of a complex manifold 100 on which
data points of different classes are displayed in distinct shaded patches, and FIG. 1B is a diagram
illustrating data points sampled from these different classes shown in the manifold 100 of FIG.
1A. For a pair of points on the manifold 100 in FIG. 1A, their Euclidean distance may not
accurately reflect their intrinsic similarity and consequently is not suitable for use in pattern
classification. For example, referring to FIG. 1B, the Euclidean distance between two data
points (e.g., $x_1$ and $x_2$) may be deceptively small in the three-dimensional input space, although
the geodesic distance between the two data points ($x_1$ and $x_2$) on the intrinsic two-dimensional
manifold 100 is large. Therefore, the geodesic distance should be used to determine how close
the data points ($x_1$ and $x_2$) are on the manifold, since the geodesic distance reflects the intrinsic
geometry of the underlying manifold 100.

[0007] Recently, the Isomap method (also known as "isometric feature mapping") and the 
Locally Linear Embedding (LLE) method have been proposed for learning the intrinsic geometry 
of complex manifolds using local geometric metrics within a single global coordinate system. 
The conventional Isomap method first constructs a neighborhood graph that connects each data
point on the manifold to all its k-nearest neighbors or to all the data points within some fixed
radius ε in the input space. For neighboring points, the input space Euclidean distance usually
provides a good approximation of their geodesic distance. For each pair of data points, the
shortest path connecting them in the neighborhood graph is computed and is used as an estimate
of the true geodesic distance. These estimates are good approximations of the true geodesic
distances if there are a sufficient number of data points on the manifold in the input space, as in
FIG. 1B. A conventional multi-dimensional scaling method is then applied to construct a low
dimensional subspace that best preserves the manifold's estimated intrinsic geometry. 

[0008] The Locally Linear Embedding (LLE) method captures local geometric properties of
the complex embedding manifolds by a set of linear coefficients in the high dimensional input
space that best approximates each data point from its neighbors in the input space. LLE then
finds a set of low dimensional points where each point can be linearly approximated by its 
neighbors with the same set of coefficients that was computed from the high dimensional data 
points in the input space while minimizing reconstruction cost. 

[0009] Although the conventional Isomap method and the LLE method have demonstrated
acceptable results in finding the embedding manifolds that best describe the data points with
minimum reconstruction error, they fail to represent the images in a way that optimally
facilitates classification of those images. Furthermore, the conventional Isomap method
and the LLE method assume that the embedding manifold is well sampled, which may not be the 
case in some classification problems such as face recognition since there are typically only a few 
samples available for each person. 

[0010] Therefore, there is a need for a method of optimally representing patterns such that
the classification of such patterns is facilitated and the intrinsic geometry of the underlying
manifold of the data points corresponding to the patterns is preserved.

Summary of Invention 

[0011] The present invention provides a method for representing images for pattern
classification by extending the conventional Isomap method with Fisher Linear Discriminant 
(FLD) or Kernel Fisher Linear Discriminant (KFLD) for classification. The method of the 
present invention estimates the geodesic distance of data points corresponding to images for 
pattern classification, similar to conventional Isomap methods, and then uses pairwise geodesic 
distances as feature vectors. According to one embodiment of the present invention, the method
applies FLD to the feature vectors to find an optimal projection direction to maximize the 
distances between cluster centers of the feature vectors. 

[0012] In another embodiment of the present invention, the method applies KFLD to the
feature vectors rather than FLD to find an optimal projection direction to maximize the distances 
between cluster centers of the feature vectors. 

[0013] The present invention also provides a system for representing images for pattern
classification by extending the conventional Isomap method with Fisher Linear Discriminant 
(FLD) or Kernel Fisher Linear Discriminant (KFLD) for classification. The system of the 
present invention includes a neighboring graph generation module for generating a neighboring 
graph for data points corresponding to the images, a geodesic distance estimation module for 
estimating the geodesic distance of data points, and a FLD projection module for generating 
feature vectors based on the pairwise geodesic distances and applying FLD to the feature vectors 
to find an optimal projection direction to maximize the distances between cluster centers of the 
feature vectors, according to one embodiment of the present invention. 

[0014] In another embodiment of the present invention, the system includes a KFLD module
rather than an FLD module for generating feature vectors based upon the pairwise geodesic
distances and applying KFLD to the feature vectors to find an optimal projection direction to
maximize the distances between cluster centers of the feature vectors.

[0015] The present invention may be embodied in various forms, including computer
program products, methods, and systems, special or general purpose computing devices or
apparatuses, online services or systems, user interfaces, etc.

Brief Description of the Drawings 

[0016] The teachings of the present invention can be readily understood by considering the
following detailed description in conjunction with the accompanying drawings. Like reference
numerals are used for like elements in the accompanying drawings.

[0017] FIG. 1A is a diagram illustrating an example of a complex manifold on which data
points of different classes are displayed in distinct shaded patches.

[0018] FIG. 1B is a diagram illustrating data points sampled from the different classes shown
in the manifold of FIG. 1A. 

[0019] FIG. 2 is a flowchart illustrating a method of representing images for pattern 
classification by extended Isomap according to a first embodiment of the present invention. 

[0020] FIG. 3 is a flowchart illustrating a method of representing images for pattern 
classification by extended Isomap according to a second embodiment of the present invention. 

[0021] FIG. 4 is a diagram illustrating a system for representing images for pattern 
classification by extended Isomap according to a first embodiment of the present invention. 

[0022] FIG. 5 is a diagram illustrating a system for representing images for pattern
classification by extended Isomap according to a second embodiment of the present invention. 

[0023] FIG. 6 is a graph illustrating the results of testing the method of representing images 
for pattern classification by extended Isomap according to the first embodiment of the present 
invention on a first set of test face images. 

[0024] FIG. 7 is a graph illustrating the results of testing the method of representing images 
for pattern classification by extended Isomap according to the first embodiment of the present 
invention on a second set of test face images. 

[0025] FIG. 8A is a graph illustrating sample images projected by the conventional Isomap 
method. 

[0026] FIG. 8B is a graph illustrating sample images projected by the extended Isomap 
method according to the first embodiment of the present invention. 

[0027] FIG. 9 is a graph illustrating the results of testing the method of representing images 
for pattern classification by extended Isomap according to the first embodiment of the present 
invention on a third set of test face images. 

Detailed Description of Embodiments 

[0028] FIG. 2 is a flowchart illustrating a method of representing images for pattern 
classification by extended Isomap according to a first embodiment of the present invention. The
extended Isomap method of FIG. 2 employs Fisher Linear Discriminant (FLD) combined with 
the conventional Isomap method to represent images for pattern classification. 

[0029] Referring to FIG. 2, a set of sample images for classification is obtained 202 in the
input space. The sample images are represented in the form of vectors. Assuming that there is a
set of m sample images $\{x_1, \ldots, x_m\}$ and that each sample image belongs to one of the c classes
$\{Z_1, \ldots, Z_c\}$, a neighboring graph of the sample images is generated 204. To this end, the
neighbors of each sample $x_i$ are determined on a low dimensional manifold M on which the data
points corresponding to the sample images lie, based on distance metrics $d_X(x_i, x_j)$ in the input
space X. Such distance metrics can be the Euclidean distance that is often used in face
recognition. Such distance metrics can also be the tangent distance that has been shown to be
effective in handwritten digit recognition. The distance between two data points in the input space
provides a good approximation of the geodesic distance when the data points are neighboring
points. Thus, the input space distance metrics can be utilized to determine whether or not two
data points are neighbors.

[0030] In one embodiment, the neighbors of each sample $x_i$ are determined by the k-Isomap
method that uses a k-nearest neighbor algorithm to determine neighbors. In another
embodiment, the neighbors of each sample $x_i$ are determined by the ε-Isomap method that
includes all data points within a predetermined radius ε as neighbors.

[0031] The neighborhood relationships between the data points corresponding to the sample 
images are represented in a weighted graph G in which: 

$$d_G(x_i, x_j) = d_X(x_i, x_j), \text{ if } x_i \text{ and } x_j \text{ are neighbors, and}$$

$$d_G(x_i, x_j) = \infty, \text{ otherwise} \qquad (1)$$
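
The weighted graph of equation (1) can be sketched as follows. This is a minimal illustration rather than the claimed implementation: it assumes the Euclidean distance for the input space metric, and the function name `neighborhood_graph` and its parameters `k` and `eps` are hypothetical names chosen for this example.

```python
import numpy as np

def neighborhood_graph(X, k=None, eps=None):
    """Weighted graph of equation (1): d_G = d_X for neighbors, inf otherwise.

    X is an (m, n) array holding m samples as rows. Give exactly one of
    k (k-Isomap) or eps (epsilon-Isomap). Euclidean distance is assumed.
    """
    if (k is None) == (eps is None):
        raise ValueError("specify exactly one of k or eps")
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    # Pairwise Euclidean distances d_X(x_i, x_j) in the input space.
    diff = X[:, None, :] - X[None, :, :]
    d_x = np.sqrt((diff ** 2).sum(axis=-1))
    if k is not None:
        # Column 0 of each sorted row is the point itself (assuming no
        # duplicate samples), so the next k columns are its neighbors.
        order = np.argsort(d_x, axis=1)[:, 1:k + 1]
        rows = np.repeat(np.arange(m), order.shape[1])
        neighbors = np.zeros((m, m), dtype=bool)
        neighbors[rows, order.ravel()] = True
    else:
        neighbors = d_x <= eps
    # Treat the neighbor relation as symmetric so the graph is undirected.
    neighbors = neighbors | neighbors.T
    d_g = np.where(neighbors, d_x, np.inf)
    np.fill_diagonal(d_g, 0.0)
    return d_g
```

Symmetrizing the neighbor relation is a design choice made here so that the resulting distance matrix is symmetric; the text above does not prescribe it.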

[0032] Next, the geodesic distances $d_M(x_i, x_j)$ between any pair of data points on the
manifold M are estimated 206. For a pair of data points that are far away, their geodesic distance
can be approximated by a sequence of short hops between neighboring data points. In other
words, the geodesic distance $d_M(x_i, x_j)$ is approximated by the shortest path between the data
points $x_i$ and $x_j$ on the weighted graph G, which is computed by the Floyd-Warshall algorithm:

$$d_M(x_i, x_j) = \min\{d_G(x_i, x_j),\; d_G(x_i, x_k) + d_G(x_k, x_j)\}, \quad k \neq i, j \qquad (2)$$

The shortest paths between any two data points are represented in a matrix D where $D_{ij} = d_M(x_i, x_j)$.
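
The relaxation in equation (2) can be illustrated with a short Floyd-Warshall sketch. It assumes the graph is given as a dense m x m array with `inf` for non-neighbors and zeros on the diagonal; the function name `geodesic_distances` is a hypothetical name used only for this example.

```python
import numpy as np

def geodesic_distances(d_g):
    """Estimate geodesic distances by all-pairs shortest paths (Floyd-Warshall).

    d_g is the m x m weighted graph of equation (1). The input is not modified.
    """
    d_m = np.array(d_g, dtype=float, copy=True)
    m = d_m.shape[0]
    for k in range(m):
        # Equation (2): keep the shorter of the current path i -> j and the
        # path i -> k -> j, for every pair (i, j) at once.
        d_m = np.minimum(d_m, d_m[:, k:k + 1] + d_m[k:k + 1, :])
    return d_m
```

Entries that remain infinite after the loop indicate pairs of points in disconnected components of the neighborhood graph; how to handle that case is not addressed in this sketch.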

[0033] Then, each of the data points corresponding to the sample images is represented 208
by a feature vector of its geodesic distances to all other points, and Fisher Linear Discriminant is
applied 208 to the feature vectors to find an optimal projection direction for classification.
Fisher Linear Discriminant determines a subspace in the low dimensional space where the class
centers are separated as far as possible so that classification of the sample images is facilitated.
The feature vector corresponding to the data point $x_i$ is an (m - 1)-dimensional vector $f_i = [D_{ij}]$,
where j = 1, ..., m and j ≠ i.

[0034] The between-class and within-class scatter matrices in Fisher Linear Discriminant are
computed by:

$$S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T \qquad (4)$$

$$S_W = \sum_{i=1}^{c} \sum_{f_k \in Z_i} (f_k - \mu_i)(f_k - \mu_i)^T \qquad (5)$$

where $\mu$ is the mean of all sample feature vectors $f_k$, $\mu_i$ is the mean of class $Z_i$, $S_{W_i}$ is the
covariance of class $Z_i$, and $N_i$ is the number of samples in class $Z_i$. The optimal projection $W_{FLD}$
is chosen as the matrix with orthonormal columns which maximizes the ratio of the determinant
of the between-class scatter matrix of the projected samples to the determinant of the within-class
scatter matrix of the projected samples:

$$W_{FLD} = \arg\max_{W} \frac{|W^T S_B W|}{|W^T S_W W|} = [w_1, w_2, \ldots, w_m] \qquad (6)$$

where $\{w_i \mid i = 1, 2, \ldots, m\}$ is the set of generalized eigenvectors of $S_B$ and $S_W$ corresponding to the
m largest generalized eigenvalues $\{\lambda_i \mid i = 1, 2, \ldots, m\}$. The rank of $S_B$ is c - 1 or less because it is
the sum of c matrices of rank one or less. Thus, there are at most c - 1 nonzero eigenvalues.
Each data point $x_i$ corresponding to the sample image is represented by a low dimensional feature
vector $y_i = W_{FLD} f_i$.
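
As a rough sketch of this step, the code below builds the (m - 1)-dimensional feature vectors from the distance matrix D and solves the generalized eigenproblem behind equation (6) with `scipy.linalg.eigh`. The helper names `geodesic_features` and `fld_projection`, the `reg` parameter, and the row-vector convention (projection computed as `F @ W`) are assumptions made for illustration; a small multiple of the identity is added to the within-class scatter so that the solver receives a positive definite matrix.

```python
import numpy as np
from scipy.linalg import eigh

def geodesic_features(D):
    """Row i of the result is row i of D with the diagonal entry D[i, i] removed."""
    D = np.asarray(D, dtype=float)
    m = D.shape[0]
    return D[~np.eye(m, dtype=bool)].reshape(m, m - 1)

def fld_projection(F, labels, reg=1e-3):
    """Return W (p x (c-1)) whose columns are the top generalized eigenvectors."""
    F = np.asarray(F, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    p = F.shape[1]
    mu = F.mean(axis=0)
    S_B = np.zeros((p, p))
    S_W = np.zeros((p, p))
    for c in classes:
        F_c = F[labels == c]
        mu_c = F_c.mean(axis=0)
        d = (mu_c - mu)[:, None]
        S_B += F_c.shape[0] * (d @ d.T)      # between-class scatter, equation (4)
        centered = F_c - mu_c
        S_W += centered.T @ centered         # within-class scatter, equation (5)
    S_W += reg * np.eye(p)                   # regularize: S_W + eps * I
    # eigh solves S_B w = lambda S_W w and returns eigenvalues in ascending order.
    _, evecs = eigh(S_B, S_W)
    return evecs[:, ::-1][:, :len(classes) - 1]
```

With samples stored as rows, the low dimensional representation of all samples would be `Y = F @ W`, one row per sample.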

[0035] The computational complexity and memory requirements of the conventional Isomap
method and the extended Isomap method according to the present invention are dominated by the
calculation of the shortest paths between the data points. The Floyd-Warshall algorithm requires
$O(m^3)$ operations and stores $O(m^2)$ elements of estimated geodesic distances for straightforward
implementations. On the other hand, the Multi-Dimensional Scaling (MDS) procedure used in
the conventional Isomap method can be time-consuming as a result of its iterative operations to
detect meaningful underlying dimensions that explain the observed similarities or dissimilarities
(distances) between data points. The neighboring graph $d_G(x_i, x_j)$ provides better estimates of the
intrinsic geodesic distance $d_M(x_i, x_j)$ as the number of data points increases. In practice,
however, there may not be a sufficient number of samples, and so the geodesic distances $d_G(x_i, x_j)$
may not be good approximations of the intrinsic geodesic distances. Thus, the conventional
Isomap may not be able to find intrinsic dimensionality from the data points and may not be
suitable for pattern classification purposes.

[0036] In contrast, the extended Isomap method utilizes the distances between the scatter 
centers (i.e., poor approximations are averaged out) and thus may perform well for pattern 
classification. While the conventional Isomap method uses MDS to find dimensions of the 
embedding manifolds, in the extended Isomap method of the present invention the 
dimensionality of the subspace is determined by the number of classes (i.e., c - 1), which makes the
computations for pattern classification much simpler than those required by conventional
Isomap.

[0037] To deal with the singularity problem of the within-class scatter matrix $S_W$ that is often
encountered in pattern classification, a multiple of the identity matrix may be added to the
within-class scatter matrix, i.e., $S_W + \varepsilon I$ (where ε is a small number), according to one embodiment of
the present invention. This also makes the eigenvalue problem numerically more stable.

[0038] FIG. 3 is a flowchart illustrating a method of representing images for pattern 
classification by extended Isomap according to a second embodiment of the present invention. 
For the data sets that are not linearly separable, the application of FLD in step 208 may be 
replaced by Kernel Fisher Linear Discriminant (KFLD) method 308. This is because KFLD 
conceptually projects data from the input space to a higher dimensional space in order to extract 
more representative features of the images prior to computing the optimal discriminant function 
to separate the data points. The method of FIG. 3 is identical to the method described in FIG. 2 
except that the feature vectors are projected using KFLD in step 308 rather than FLD. 

[0039] In KFLD analysis, each feature vector $f_i$ (where $f_i = [D_{ij}]$, j = 1, ..., m and j ≠ i)
obtained from the extended Isomap method is projected from the input space, $R^n$, to $\Phi(f_i)$ in a high
dimensional feature space $R^F$ by a nonlinear mapping function (projection function):

$$\Phi: R^n \rightarrow R^F, \quad F > n \qquad (7)$$

Each feature vector $f_i$ will be denoted as feature vector f hereinafter for the convenience of
describing the equations involved with KFLD. The dimension F of the high dimensional feature
space can be arbitrarily large.

[0040] Selection of a specific projection function is dependent upon the nature of the data
and application and is often empirically determined. Numerous forms of projection functions
$\Phi(x)$ can be used for the present invention. However, there are only a limited number of
projection functions that are compatible with efficient and systematic computation. One
approach for selecting a particular projection function $\Phi(x)$ is to select a projection function of
which the dot product can be computed efficiently using a kernel function rather than by actually
performing the dot product operation of the projection functions. This matters because dot product
operations of the projection functions are computationally intense and are used frequently in the
computation carried out for projecting the feature vectors from the high dimensional feature space
to the low dimensional face image space. Thus, such an approach finds kernel functions k(x, y) that
satisfy the following relation:

$$k(x, y) = \Phi(x) \cdot \Phi(y) \qquad (8)$$

[0041] Typically, computations using the kernel function k(x, y) can be carried out much
more efficiently compared to computations using the dot product $\Phi(x) \cdot \Phi(y)$, because the
computation using the kernel function k(x, y) depends on the n-dimensional input space (usually
low) whereas the computation of $\Phi(x) \cdot \Phi(y)$ depends on the dimensionality of $\Phi(x)$ and $\Phi(y)$,
which is usually very high and can be infinite. The polynomial kernel ($k(x, y) = (x \cdot y)^d$) and
the Gaussian kernel ($k(x, y) = e^{-\|x - y\|^2 / 2\sigma^2}$, where σ is the standard deviation of the Gaussian
distribution from which x and y are drawn) are the most widely used kernel functions.
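
The relation between a kernel and its projection function can be checked numerically in a small case. The sketch below assumes the polynomial kernel with d = 2 on two-dimensional inputs, for which one valid explicit map is $\Phi(x) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2)$, and the standard Gaussian form $e^{-\|x-y\|^2/(2\sigma^2)}$. The function names are hypothetical and serve only this illustration.

```python
import numpy as np

def polynomial_kernel(x, y, d=2):
    """k(x, y) = (x . y)^d, computed in the n-dimensional input space."""
    return float(np.dot(x, y)) ** d

def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

def phi_degree2(x):
    """Explicit feature map for the degree-2 polynomial kernel on 2-D inputs."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
# Both routes give the same value; the kernel never forms Phi explicitly.
k_direct = polynomial_kernel(x, y, d=2)
k_mapped = float(np.dot(phi_degree2(x), phi_degree2(y)))
```

For these inputs both `k_direct` and `k_mapped` evaluate to 121, since x . y = 11.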

[0042] Note that the exact form of the projection functions ($\Phi(x)$, $\Phi(y)$) is completely
dictated by the selected kernel function k(x, y). In fact, the exact closed forms of the projection
functions need not be known if only the dot products of the projected samples, $\Phi(x) \cdot \Phi(y)$, are
used in the computation for projecting the face images from the high dimensional feature space
to the lower dimensional face image space, since the kernel function k(x, y) can be used instead to
perform such projection in a computationally efficient way. Thus, one advantage of using kernel
functions is that an n-dimensional face image can be projected to an f-dimensional feature space
(f is much larger than n), which provides a richer feature representation, without knowing the
exact closed form of the projection function. When the d-degree polynomial kernel function is
used, the dimensionality f of the high dimensional feature space is

$$f = \binom{d + n - 1}{d}.$$
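
To give a sense of scale, the count of degree-d monomials in n variables, $\binom{d+n-1}{d}$, can be evaluated directly. The sketch below is illustrative only; the function names are hypothetical, and the brute-force enumeration is included just to confirm the closed form on small cases.

```python
from itertools import combinations_with_replacement
from math import comb

def poly_feature_dim(n, d):
    """Number of distinct degree-d monomials in n variables."""
    return comb(d + n - 1, d)

def count_monomials(n, d):
    """Brute-force count: each monomial is a multiset of d variable indices."""
    return sum(1 for _ in combinations_with_replacement(range(n), d))

# n = 2, d = 2 gives the three monomials x1^2, x1*x2, x2^2.
small = poly_feature_dim(2, 2)
# n = 644 (for example, a 23 x 28 pixel vector) with d = 2.
large = poly_feature_dim(644, 2)
```

Here `small` is 3 and `large` is 207690, which shows how quickly the implicit feature space grows even for a low polynomial degree.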



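[0043] Denoting the within-class and between-class scatter matrices in the high dimensional
space $R^F$ by $S_W^\Phi$ and $S_B^\Phi$, respectively, and applying FLD in the high-dimensional kernel space
$R^F$, it is necessary to find eigenvalues λ and eigenvectors $w^\Phi$ of the eigenvalue problem:

$$S_B^\Phi w^\Phi = \lambda S_W^\Phi w^\Phi \qquad (9)$$

In the high dimensional feature space $R^F$, the following equations follow:

$$S_W^\Phi = \sum_{i=1}^{c} S_i^\Phi \qquad (10)$$

$$S_i^\Phi = \sum_{f \in Z_i} (\Phi(f) - \mu_i^\Phi)(\Phi(f) - \mu_i^\Phi)^T \qquad (11)$$

$$\mu_i^\Phi = \frac{1}{n_i} \sum_{f \in Z_i} \Phi(f) \qquad (12)$$

$$S_B^\Phi = \sum_{i=1}^{c} n_i (\mu_i^\Phi - \mu^\Phi)(\mu_i^\Phi - \mu^\Phi)^T \qquad (13)$$

where $n_i$ is the number of samples in class $Z_i$ and $\mu^\Phi$ is the total mean of vector $\Phi(f)$, i.e., $\mu^\Phi = \frac{1}{m} \sum_{f} \Phi(f)$.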



[0044] It follows that the optimal projection matrix $W_{opt}^\Phi$ in the high dimensional space $R^F$
is:

$$W_{opt}^\Phi = \arg\max_{w^\Phi} \frac{|(w^\Phi)^T S_B^\Phi w^\Phi|}{|(w^\Phi)^T S_W^\Phi w^\Phi|} = [w_1^\Phi, \ldots, w_m^\Phi] \qquad (14)$$

where $\{w_i^\Phi \mid i = 1, 2, \ldots, m\}$ is the set of generalized eigenvectors corresponding to the m largest
generalized eigenvalues $\{\lambda_i \mid i = 1, 2, \ldots, m\}$. "arg max" in equation (14) finds the $w^\Phi$ that
maximizes the ratio that follows it.

[0045] To avoid the singularity problem in computing $w^\Phi$, a small multiple of the identity matrix I may be
added to $S_W^\Phi$ in order to make it numerically stable, according to one embodiment of the present
invention. In other words, $S_W^\Phi = S_W^\Phi + \varepsilon I$, where I is an identity matrix whose dimensionality
is the same as $S_W^\Phi$ and ε is a small real number, for example 0.001 according to one embodiment
of the present invention. By adding a small real number to the diagonals of the within-class
scatter matrix, none of the elements on the diagonal of the within-class scatter matrix can be
zero, thus eliminating singularity problems.

[0046] Consider a c-class problem (i.e., each sample belongs to one of the c classes) and let
the r-th sample of class t and the s-th sample of class u be $f_{tr}$ and $f_{us}$, respectively (where class t
has $l_t$ samples and class u has $l_u$ samples). The kernel function can be defined as:

$$(k_{rs})_{tu} = k(f_{tr}, f_{us}) = \Phi(f_{tr}) \cdot \Phi(f_{us}) \qquad (15)$$

Let K be an m × m matrix defined by the elements $(K_{tu})_{t=1,\ldots,c}^{u=1,\ldots,c}$, where $K_{tu}$ is a matrix composed of
dot products in the high dimensional feature space $R^F$, i.e.,

$$K = (K_{tu})_{t=1,\ldots,c}^{u=1,\ldots,c} \qquad (16)$$

where

$$K_{tu} = (k_{rs})_{r=1,\ldots,l_t}^{s=1,\ldots,l_u} \qquad (17)$$

Here, $K_{tu}$ is an $l_t \times l_u$ matrix, and K is an m × m symmetric matrix. Also, matrix Z is defined:

$$Z = (Z_t)_{t=1,\ldots,c} \qquad (18)$$

where $(Z_t)$ is an $l_t \times l_t$ matrix with terms all equal to $1/l_t$, i.e., Z is an m × m block diagonal matrix.

[0047] The between-class and within-class scatter matrices in the high dimensional feature
space $R^F$ in equations (13) and (10), respectively, become:

$$S_B^\Phi = \sum_{i=1}^{c} l_i\, \mu_i^\Phi (\mu_i^\Phi)^T \qquad (19)$$

$$S_W^\Phi = \sum_{i=1}^{c} \sum_{j=1}^{l_i} \Phi(f_{ij})\, \Phi(f_{ij})^T \qquad (20)$$

where $\mu_i^\Phi$ is the mean of class i in $R^F$, and $l_i$ is the number of samples belonging to class i.
From the theory of reproducing kernels, any solution $w^\Phi \in R^F$ must lie in the span of all training
samples in $R^F$, i.e.,

$$w^\Phi = \sum_{p=1}^{c} \sum_{q=1}^{l_p} \alpha_{pq}\, \Phi(f_{pq}) \qquad (21)$$

It follows that the solution for (21) is obtained by solving:

$$\lambda K K \alpha = K Z K \alpha \qquad (22)$$

Consequently, equation (14) can be written as:

$$W_{opt}^\Phi = \arg\max_{w^\Phi} \frac{|(w^\Phi)^T S_B^\Phi w^\Phi|}{|(w^\Phi)^T S_W^\Phi w^\Phi|} = \arg\max_{\alpha} \frac{|\alpha^T K Z K \alpha|}{|\alpha^T K K \alpha|} \qquad (23)$$

where "arg max" in equation (23) finds the $w^\Phi$ that maximizes the ratio that follows it.
The extracted eigenvector $w^\Phi = [w_1^\Phi, \ldots, w_m^\Phi]$ obtained in equation (23) is called the Kernel
Fisherface.

[0048] The vectors $\Phi(f)$ in the high dimensional feature space $R^F$ can now be projected to a
lower dimensional space spanned by the Kernel Fisherface (eigenvector) $w^\Phi$. The lower
dimensional space has a dimension lower than both the dimension of the input space and the high
dimensional feature space. The projection of $\Phi(f)$ onto the eigenvectors $w^\Phi$ becomes the
nonlinear Fisher Linear Discriminant (FLD) corresponding to $\Phi(f)$:

$$w^\Phi \cdot \Phi(f) = \sum_{p=1}^{c} \sum_{q=1}^{l_p} \alpha_{pq}\, \Phi(f_{pq}) \cdot \Phi(f) = \sum_{p=1}^{c} \sum_{q=1}^{l_p} \alpha_{pq}\, k(f_{pq}, f) \qquad (24)$$

In this manner, it is possible to extract the Fisher Linear Discriminants using the kernel function
without actually carrying out the burdensome computation that results from projecting the
samples to a high dimensional feature space $R^F$. The low dimensional feature vector y for
each feature vector f is $w^\Phi \cdot \Phi(f)$ obtained in equation (24).
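
The kernel formulation can be sketched in code as a generalized eigenproblem in the coefficient vector α, followed by a projection that uses only kernel evaluations. This is a minimal sketch under stated assumptions, not the claimed implementation: the kernel matrix is centered in feature space (the uncentered forms of the scatter matrices implicitly assume zero-mean mapped data), the regularizer is added to K K rather than to the feature-space scatter matrix, and the names `kfld_fit`, `kfld_project`, `gaussian_kernel`, and `reg` are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def gaussian_kernel(x, y, sigma=0.5):
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

def kfld_fit(F, labels, kernel=gaussian_kernel, reg=1e-3):
    """Solve lambda * K K alpha = K Z K alpha for the top c-1 coefficient vectors."""
    F = np.asarray(F, dtype=float)
    labels = np.asarray(labels)
    m = F.shape[0]
    classes = np.unique(labels)
    K = np.array([[kernel(a, b) for b in F] for a in F])
    # Assumption: center the mapped samples in feature space.
    H = np.eye(m) - np.ones((m, m)) / m
    Kc = H @ K @ H
    # Z has entries 1/l_t for pairs of samples in the same class t, else 0.
    Z = np.zeros((m, m))
    for c in classes:
        idx = np.flatnonzero(labels == c)
        Z[np.ix_(idx, idx)] = 1.0 / idx.size
    A = Kc @ Z @ Kc
    B = Kc @ Kc + reg * np.eye(m)   # assumption: regularize in alpha-space
    # Symmetrize against round-off; eigh returns ascending eigenvalues.
    _, evecs = eigh((A + A.T) / 2.0, (B + B.T) / 2.0)
    alpha = evecs[:, ::-1][:, :len(classes) - 1]
    return {"alpha": alpha, "K": K, "F": F, "kernel": kernel}

def kfld_project(model, f):
    """Project one feature vector using kernel evaluations only."""
    K, F, kernel = model["K"], model["F"], model["kernel"]
    k_vec = np.array([kernel(g, f) for g in F])
    # Center the kernel vector consistently with the centered training matrix.
    k_c = k_vec - K.mean(axis=1) - k_vec.mean() + K.mean()
    return model["alpha"].T @ k_c
```

A usage example on an XOR-like layout, which is not linearly separable in the input space: with `F = [[0, 0], [1, 1], [0, 1], [1, 0]]` and `labels = [0, 0, 1, 1]`, calling `kfld_project(kfld_fit(F, labels), f)` for each training vector f places the two classes on opposite sides of zero along the single discriminant direction.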

[0049] FIG. 4 is a diagram illustrating a system 400 for representing images for pattern 
classification by extended Isomap according to a first embodiment of the present invention. The 
system 400 includes a neighboring graph generation module 402, a geodesic distance estimation 
module 404, and an FLD projection module 406. The neighboring graph generation module 402
receives sample images for classification and generates a neighboring graph of the data points
corresponding to the sample images according to step 204 of FIG. 2. The geodesic distance 
estimation module 404 estimates the geodesic distance between data points on the neighboring 
graph according to step 206 of FIG. 2. The FLD projection module 406 represents each data 
point by a feature vector corresponding to its geodesic distance to other data points on the 
neighboring graph and applies FLD to the feature vectors according to step 208 of FIG. 2, to 
output feature vectors representing the sample images that are optimally projected for 
classification. 

[0050] FIG. 5 is a diagram illustrating a system 500 for representing images for pattern
classification by extended Isomap according to a second embodiment of the present invention.
The system 500 includes a neighboring graph generation module 502, a geodesic distance
estimation module 504, and an FLD projection module 506. The neighboring graph generation
module 502 receives sample images for classification and generates a neighboring graph of the
data points corresponding to the sample images according to step 304 of FIG. 3. The geodesic
distance estimation module 504 estimates the geodesic distance between data points on the
neighboring graph according to step 306 of FIG. 3. The FLD projection module 506 represents
each data point by a feature vector corresponding to its geodesic distance to other data points on 
the neighboring graph and applies KFLD to the feature vectors according to step 308 of FIG. 3 to 
output feature vectors representing the sample images that are optimally projected for 
classification. 

[0051] Two pattern classification problems, face recognition and handwritten digit 
recognition, were considered in order to test the performance of the extended Isomap method of 
the present invention in comparison with the conventional Isomap methods. These two problems 
have interesting characteristics and are approached differently. In the appearance-based methods 
for face recognition in frontal pose, each face image provides a rich description of one's identity 
and as a whole (i.e., holistic) is usually treated as a pattern without extracting features explicitly. 
Instead, subspace methods such as PCA (Principal Component Analysis) and FLD (Fisher Linear 
Discriminant) are applied to implicitly extract meaningful (e.g., PCA) or discriminant (e.g., 
FLD) features and then project patterns to a lower dimensional subspace for recognition. On the 
contrary, sophisticated feature extraction techniques are usually applied to handwritten digit 
images before any decision surface is induced for classification. 

[0052] The extended Isomap of the present invention was tested against the conventional 
Isomap methods, LLE, Eigenface (PCA), and Fisherface (FLD) methods using a first set of test 
face images, a second set of test face images, and a third set of test handwritten digit images 
retrieved from publicly available databases. The first and second sets of test face images had 
several unique characteristics. While the images in the first set of test face images contained 
facial contours and varied in pose as well as scale, the face images in the second set of test face
images had been cropped and aligned to include internal facial structures such as the eyebrows,
eyes, nose, mouth and chin but did not contain facial contours. The face images in the first set of
test face images were taken under well-controlled lighting conditions whereas the images in the
second set of test face images were acquired under varying lighting conditions. For the handwritten
digit recognition problem, the extended Isomap method of the present invention was tested using
the third set of test images of handwritten digits. 

[0053] FIG. 6 is a graph illustrating the results of testing the method of representing images 
for pattern classification by extended Isomap according to the first embodiment of the present 
invention (using FLD with conventional Isomap) on the first set of test face images. The first set 
of test face images contained 400 images of 40 subjects. To reduce computational complexity, 
each face image was down-sampled to 23 x 28 pixels for experiments. Each face image was 
represented by a raster scan vector of the intensity values, and then the intensity values were 
normalized to be zero-mean unit-variance vectors. 

[0054] The tests were performed using the "leave-one-out" strategy (i.e., m-fold
cross-validation). To classify an image of a person, that image was removed from the set, leaving
a training set of (m - 1) images from which the projection matrix was computed. All m images
were projected to a reduced space and recognition was performed based on a nearest neighbor
classifier. The parameters, such as the number of principal components in the Eigenface (PCA)
and the LLE methods, were empirically determined to achieve the lowest error rate by each
method. For the Fisherface method (FLD) and the extended Isomap method of the present
invention, all samples were projected onto a subspace spanned by the eigenvectors with the c - 1 largest eigenvalues
(where c is the number of classes). 
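
The evaluation protocol described above can be sketched generically. The code below is only an outline of leave-one-out testing with a nearest neighbor classifier: the callables `fit` and `project` are hypothetical placeholders for whichever projection method is being evaluated, and the sketch does not reproduce the reported experiments or their data.

```python
import numpy as np

def leave_one_out_error(X, labels, fit, project):
    """Leave-one-out error rate with a 1-nearest-neighbor classifier.

    fit(X_train, y_train) returns a model; project(model, X) returns the
    reduced-space representation of the rows of X. Both are supplied by
    the caller and are assumptions of this sketch.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    m = X.shape[0]
    errors = 0
    for i in range(m):
        train = np.arange(m) != i            # hold out sample i
        model = fit(X[train], labels[train])
        Y_train = project(model, X[train])
        y_test = project(model, X[i:i + 1])[0]
        # Classify the held-out sample by its nearest projected neighbor.
        nearest = np.argmin(np.linalg.norm(Y_train - y_test, axis=1))
        errors += int(labels[train][nearest] != labels[i])
    return errors / m
```

For methods whose features depend on geodesic distances to the training points, `project` would also need to compute those distances for the held-out sample; that step is outside the scope of this outline.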

[0055] Referring to FIG. 6, among all the face recognition methods, the extended Isomap 
method with the radius implementation (labeled "Ext Isomap (e)") is shown as achieving the 
lowest error rate, outperforming the Fisherface method (labeled "FLD") by a significant 
margin as well as the other methods (e.g., PCA, LLE). The two implementations of the extended 
Isomap (one with k-nearest neighbor (extended k-Isomap, labeled "Ext Isomap (neighbor)") and 
the other with ε radius (extended ε-Isomap, labeled "Ext Isomap (e)") for determining 
neighboring data points) consistently perform better than their counterpart conventional Isomap 
methods (k-Isomap labeled "Isomap (neighbor)" and ε-Isomap labeled "Isomap (e)", 
respectively), with lower error rates in pattern classification by a significant margin. 
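
For illustration, the two ways of determining neighboring data points mentioned above (a fixed 
number k of nearest neighbors, or all points within a radius) can be sketched as follows. The 
function names are hypothetical, Euclidean distance is assumed, and the brute-force search is 
chosen for clarity rather than efficiency. 

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_neighbors(points, k):
    """For each point, return the indices of its k closest other points."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: euclidean(p, points[j]))
        result.append(sorted(others[:k]))
    return result


def radius_neighbors(points, radius):
    """For each point, return the indices of all other points within radius."""
    result = []
    for i, p in enumerate(points):
        result.append([j for j in range(len(points))
                       if j != i and euclidean(p, points[j]) <= radius])
    return result


points = [[0.0], [1.0], [2.5], [10.0]]
by_count = knn_neighbors(points, k=1)
by_radius = radius_neighbors(points, radius=2.0)
```

In this toy data the isolated point at 10.0 still receives one neighbor under the k-nearest 
neighbor rule but none under the radius rule, which is the practical difference between the 
two implementations. 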

[0056] FIG. 7 is a graph illustrating the results of testing the method of representing images 
for pattern classification by extended Isomap according to the first embodiment of the present 
invention (using FLD with conventional Isomap) on the second set of test face images. The 
second set of test face images contained 165 images of 11 subjects, including variations in 
facial expression and lighting. For computational efficiency, each face image was down-sampled 
to 29 x 41 pixels. As with the first set, each face image was represented by a centered vector 
of normalized intensity values. 

[0057] The tests were performed using the leave-one-out strategy, and the number of 
principal components was varied to achieve the lowest error rates for the Eigenface (PCA) and 
LLE methods. For the Fisherface method (FLD) and the extended Isomap method of the present 
invention, the samples were projected onto a subspace spanned by the eigenvectors corresponding 
to the c - 1 largest eigenvalues (where c is the number of classes). 

[0058] Referring to FIG. 7, both implementations of the extended Isomap method (one with 
k-nearest neighbor (extended k-Isomap, labeled "Ext Isomap (neighbor)") and the other with ε 
radius (extended ε-Isomap, labeled "Ext Isomap (e)") for determining neighboring data points) 
perform better than their counterpart conventional Isomap methods (Isomap (neighbor) and 
Isomap (e), respectively), with lower error rates in pattern classification. Furthermore, the 
extended ε-Isomap method (Ext Isomap (e)) performs almost as well as the Fisherface method 
(FLD), which is one of the best methods in face recognition, while the conventional Isomap 
does not perform well on the second set of test face images. 

[0059] FIG. 8A is a graph illustrating sample images in the second set of test face images 
projected by the conventional Isomap method, and FIG. 8B is a graph illustrating sample images 
in the second set of test face images projected by the extended Isomap method according to the 
first embodiment of the present invention. In FIG. 8A, the test samples of the second set of test 
face images were projected onto the first two eigenvectors extracted by the conventional Isomap 
method, and in FIG. 8B, the test samples of the second set of test face images were projected 
onto the first two eigenvectors extracted by the extended Isomap method of the present invention 
as described in FIG. 2. The projected samples of different classes are smeared in the 
conventional Isomap method as shown in FIG. 8A, whereas the samples projected by the 
extended Isomap method of the present invention are separated well as shown in FIG. 8B. 

[0060] FIG. 9 is a graph illustrating the results of testing the method of representing images 
for pattern classification by extended Isomap according to the first embodiment of the present 
invention on the third set of test handwritten digit images. The third set of test handwritten digit 
images contained a training set of 60,000 examples and a test set of 10,000 examples. The 
images were normalized while preserving their aspect ratio, and each one was centered in a 28 x 
28 window of gray scales by computing the center of mass of the pixels and translating the 
image so as to position the center of mass at the center. 
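
The centering step described above can be sketched as follows. This is an illustrative 
approximation only: the function name is hypothetical, the translation is rounded to whole 
pixels (an assumption), and the size normalization that preserves the aspect ratio is not 
shown. 

```python
def center_by_mass(image, size=28):
    """Place an image (list of rows) in a size x size window so that its
    intensity center of mass lands at the window center."""
    total = sum(sum(row) for row in image)
    if total == 0:
        raise ValueError("image has no intensity mass to center")
    # Intensity-weighted mean row and column positions.
    row_center = sum(r * sum(row) for r, row in enumerate(image)) / total
    col_center = sum(c * v for row in image for c, v in enumerate(row)) / total
    # Assumption: the shift is rounded to whole pixels.
    window_center = (size - 1) / 2
    row_shift = round(window_center - row_center)
    col_shift = round(window_center - col_center)
    window = [[0] * size for _ in range(size)]
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            rr, cc = r + row_shift, c + col_shift
            if 0 <= rr < size and 0 <= cc < size:
                window[rr][cc] = value
    return window


# A single bright pixel is moved to the middle of a 5 x 5 window.
glyph = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
centered = center_by_mass(glyph, size=5)
```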

[0061] Due to computational and memory constraints, a training set of 1,500 handwritten 
digit images and a non-overlapping test set of 250 images were randomly selected for the 
experiments from the third set of test handwritten digit images. The same experiment was 
carried out five times, and the test parameters were varied to achieve the lowest error rates 
in each run. Each image was represented by a raster scan vector of intensity values without 
applying any feature extraction algorithms. 
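
One possible way of drawing non-overlapping training and test subsets is sketched below. The 
function name, the pool size, and the seeds are assumptions for illustration; the text does not 
state from which portion of the third set each subset was drawn or how the random selection 
was made. 

```python
import random


def split_disjoint(pool_size, n_train, n_test, seed=None):
    """Randomly pick disjoint training and test index lists from one pool."""
    rng = random.Random(seed)
    # Sampling without replacement guarantees that no index is reused.
    chosen = rng.sample(range(pool_size), n_train + n_test)
    return chosen[:n_train], chosen[n_train:]


# Five repetitions, each with its own random selection (seeds are arbitrary).
runs = [split_disjoint(60000, 1500, 250, seed=run) for run in range(5)]
```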

[0062] Referring to FIG. 9, the averaged results of conventional k-Isomap (labeled 
"K-Isomap (neighbor)"), the extended k-Isomap (labeled "Ext K-Isomap (neighbor)"), conventional 
ε-Isomap (labeled "Isomap (e)"), and the extended ε-Isomap (labeled "Ext Isomap (e)") methods 
are shown. As shown in FIG. 9, the extended Isomap methods consistently outperform their 
counterpart conventional Isomap methods with lower error rates in pattern recognition in the 
experiments with the third set of test handwritten digit images. 

[0063] Although the present invention has been illustrated as a method and system, it should 
be noted that the method of the present invention can be embodied in a computer program 
product recorded on any type of computer readable medium. It should be noted that the 
language used in the specification has been principally selected for readability and instructional 
purposes, and may not have been selected to delineate or circumscribe the inventive subject 
matter. Accordingly, the disclosure of the present invention is intended to be illustrative, but not 
limiting, of the scope of the invention, which is set forth in the following claims. 